Voiced fricative synthesizer



United States Patent

[56] References Cited

UNITED STATES PATENTS
3,395,249 7/1968 Clapper 179/1
3,190,963 6/1965 David et al. 179/1

Primary Examiner: Kathleen H. Claffy
Assistant Examiner: Douglas W. Olms
Attorneys: R. J. Guenther and William L. Keefauver

ABSTRACT: A speech synthesizer is disclosed in which voiced fricatives are synthesized from a voiced component and an unvoiced component, the unvoiced component being generated only during those times the voiced component exceeds a selected amplitude.

BACKGROUND OF THE INVENTION

This invention relates to the synthesis of speech and, in particular, to the synthesis of voiced fricatives.

Speech synthesizers are devices which produce speech from coded signals representing the fundamental components of speech. Best known are vocoder synthesizers, which produce replicas of input speech from narrowband control signals derived from the input speech. But recently, speech synthesizers for so-called "voice response systems," which provide artificially generated spoken responses to coded requests for information, have become of increasing interest.

Most current synthesizers produce speech of low quality. Partly, this is because a high quality synthesizer must reproduce accurately the complicated process by which a person speaks. This process is not completely understood. As a result, synthesized speech is often highly unnatural, with rough transitions between sounds, unrealistic sound durations, and disconcerting background noise.

A particularly difficult sound to synthesize has been the voiced fricative. Voiced fricatives are produced by passing quasi-periodic puffs of air, released from the lungs by vibrating vocal cords, through constrictions in the vocal tract. For a volume flow of air above a certain minimum flow, the air flow through the constriction becomes turbulent. Thus, an analysis of voiced fricatives shows that they contain both turbulent noise energy (the so-called unvoiced component) and quasi-periodic energy (the so-called voiced component). An accurate synthesis of voiced fricatives must reproduce both of these components.

SUMMARY OF THE INVENTION

This invention improves the quality of synthesized voiced fricatives by accurately simulating the processes in the human vocal tract which produce both the unvoiced and voiced components of voiced fricatives. As a result, the synthesized voiced fricatives sound highly natural.

According to this invention, quasi-periodic pulses from a pulse generator at a desired fundamental frequency are sent along two parallel paths. In one path, these pulses are passed through a formant network to produce a signal representing the voiced component of a voiced fricative. In the other path, these pulses are first shaped to resemble, as a function of time, the air flow through the vocal cords. A threshold circuit and a half-wave rectifier pass only those portions of the shaped pulses which exceed a selected amplitude. The resulting rectified pulse portions then modulate the output signal from a noise generator. Finally, a spectral shaping network produces, from the modulated noise signal, a signal representing the unvoiced component of the voiced fricative. The voiced fricative is synthesized by summing the output signals from the first and second paths and passing the resulting sum signal through a loudspeaker.

The threshold circuit and the half-wave rectifier simulate the fact that turbulence is introduced by the constriction only when the volume flow of air exceeds a certain minimum value. Otherwise, no turbulence is produced. Hence the synthesizer of this invention duplicates the actual mechanism by which the vocal tract produces voiced fricatives.

This invention may be more fully understood from the following detailed description taken together with the drawings.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 shows in schematic form the network used in this invention to produce voiced fricatives; and

FIG. 2 shows the shape of certain pulses at selected points in the network of FIG. 1.

DETAILED DESCRIPTION

Linguists consider speech to be composed of a variety of basic sounds called phonemes. Phonemes themselves are further subdivided according to their manner and place of production within the vocal tract. In general, as explained by J. L. Flanagan in his book "Speech Analysis, Synthesis and Perception," Academic Press Inc., 1965, Chapter 1, English speech (American dialect) is composed of vowels, normally produced exclusively by vocal cord excitation of the vocal tract, and consonants, both voiced and unvoiced, produced by various combinations of quasi-periodic and incoherent excitation of the vocal tract. In particular, the voiced fricatives are consonants produced by combinations of vocal cord and noise excitation of the vocal tract.

As explained by Flanagan, sounds are produced by expelling air from the lungs along the trachea through the vocal cords into the vocal tract and out of the mouth. The vocal tract is an irregular tube whose cross-sectional area is controlled by movement of the lips, jaw, and velum (the so-called soft palate), which controls access to the nasal cavity. When the velum is open, sounds radiate from both the mouth and the nostrils.

To produce a voiced fricative, quasi-periodic puffs of air, released from the lungs by the vocal cords, are passed through a constriction created at some point in the vocal tract by, for example, the tongue. For a volume flow of air above a critical minimum value, the constriction introduces turbulence, with the result that the voiced fricative contains both quasi-periodic and incoherent energy components. An accurate synthesis of a voiced fricative, therefore, must take into consideration both the quasi-periodic vocal cord excitation and the incoherent, noiselike energy superimposed on this excitation by the vocal tract constrictions. A system to do this is shown in FIG. 1.

Pulse generator 1 produces pulses, shown on line 1 of FIG. 2, at a selected frequency F which, in general, can vary with time. The pulses from generator 1 are sent along two paths.
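As an illustration only, the pulse source can be sketched in sampled-data form as a phase accumulator that emits a unit pulse each time one pitch period has elapsed. The patent describes an analog generator; the discrete-time form, the function name `pulse_train`, and the sample rate used here are assumptions made for this sketch.

```python
def pulse_train(f0_hz, sample_rate, n_samples):
    """Return a list of 0.0/1.0 samples with one unit pulse per pitch period.

    f0_hz is either a constant in Hz or a callable f0_hz(t_seconds) -> Hz,
    which lets the frequency F vary with time (hypothetical interface).
    """
    out = [0.0] * n_samples
    phase = 1.0  # start at a full period so the first sample carries a pulse
    for n in range(n_samples):
        f = f0_hz(n / sample_rate) if callable(f0_hz) else f0_hz
        if phase >= 1.0:
            out[n] = 1.0
            phase -= 1.0
        phase += f / sample_rate  # fraction of a pitch period covered per sample
    return out


# Illustrative values only: 125 Hz at 8 kHz gives one pulse every 64 samples.
pulses = pulse_train(125.0, 8000, 640)
```

Passing a callable instead of a constant would model a fundamental frequency that glides over the course of the sound.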

First, they are passed through formant network 2. Network 2 modifies the frequency components of these pulses by the vocal tract transfer function so as to produce a signal representing the voiced component of a voiced fricative with the correct amplitude spectrum. A voiced spectrum synthesizer suitable for use as formant network 2 in the practice of the invention is described in E. E. David, Jr. et al. U.S. Pat. No. 3,190,963 granted June 22, 1965, particularly with reference to FIG. 5. Generation of control signals for energizing the network is described in the patent, particularly with regard to FIG. 1.

Second, these pulses are passed through shaping filter 3, resonator 4, threshold circuit 14, modulator 10 and network 11 to produce a signal representing the unvoiced component of a voiced fricative.

Filter 3 produces output pulses closely resembling in shape output pulses from the vocal cords. The shape of these pulses is described by J. L. Flanagan in an article entitled "Some Properties of the Glottal Sound Source," published in Volume 1, Journal of Speech and Hearing Research, pages 99 to 116 (1958). These shaped output pulses, shown on line 2 of FIG. 2, are then passed through resonator 4, which represents the acoustical characteristics of the vocal tract back cavity, the volume between the vocal cords and the point of constriction in the vocal tract. This resonator essentially reproduces the first formant of the speech spectrum. A typical resonator is described, for example, on page 184 of the above-cited book by Flanagan. FIG. 3A of the David et al. U.S. Pat. No. 3,190,963 shows a similar resonator, and the associated text discusses the form of suitable control signals and the manner of generating them. Resonators to take into account higher frequency formants of the speech spectrum are unnecessary because the 12 dB per octave attenuation of the vocal tract frequency characteristics makes the amplitudes of the higher frequency formants quite small relative to the amplitude of the first formant. In addition, the amplitudes of the high frequency components of the pulses from the vocal cords are smaller than the amplitudes of the low frequency components of these pulses.
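A single-formant resonator of this kind can be approximated digitally by a two-pole filter whose pole angle sets the center frequency and whose pole radius sets the bandwidth. The sketch below is a discrete-time stand-in, not the analog circuit the patent refers to; the function name `resonator`, the unity-gain-at-DC normalization, and the numeric values in the usage line are assumptions chosen for illustration.

```python
import math


def resonator(signal, center_hz, bandwidth_hz, sample_rate):
    """Two-pole digital resonator normalized to unity gain at DC.

    Difference equation: y[n] = a*x[n] + b*y[n-1] + c*y[n-2].
    """
    t = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)           # minus pole radius squared
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
         * math.cos(2.0 * math.pi * center_hz * t))             # 2 * radius * cos(angle)
    a = 1.0 - b - c                                             # forces H(z=1) = 1
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# Illustrative values only: response of a 500 Hz, 100 Hz-wide resonance to one pulse.
ringing = resonator([1.0] + [0.0] * 99, 500.0, 100.0, 8000)
```

Recomputing `a`, `b`, and `c` from time-varying center frequency and bandwidth values would be one way to follow a changing vocal tract shape.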

Now, in any flow constriction, turbulence is not produced until the volume velocity of the flowing fluid exceeds some threshold value. This is true in the vocal tract. Thus, threshold circuit 14, composed of summing network 5, together with battery 6, attenuator 7, and half-wave rectifier 8, simulates the threshold at which the flow through a constriction changes from laminar to turbulent. Half-wave rectifier 8 passes only the positive components of the output signal from summing network 5. But summing network 5, in turn, produces a positive output signal only when the output signal from resonator 4 exceeds the absolute value of the bias voltage from battery 6 as adjusted by attenuator 7. Thus, in effect, rectifier 8 passes only those portions of the pulses from resonator 4 which exceed a selected amplitude. Because the volume flow at which the flow through the constriction becomes turbulent is a function of the diameter of the constriction, attenuator 7 can be adjusted as a function of time, if desired, to represent changes in constriction size with time.
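The behavior of the threshold circuit can be sketched as adding a negative bias to each sample and then keeping only the positive part of the sum. This is a minimal sketch under stated assumptions: the function name `threshold_circuit`, the representation of attenuator 7 as a scale factor between 0 and 1, and the sample values are all hypothetical.

```python
def threshold_circuit(signal, battery_volts, attenuation):
    """Bias-and-rectify model of summing network 5 and half-wave rectifier 8.

    attenuation is an assumed 0..1 tap position on attenuator 7; the
    effective threshold is abs(battery_volts) * attenuation.
    """
    bias = -abs(battery_volts) * attenuation  # negative bias at the attenuator tap
    out = []
    for x in signal:
        summed = x + bias                             # summing network
        out.append(summed if summed > 0.0 else 0.0)   # half-wave rectifier
    return out


# Illustrative values only: a 1 V battery at half attenuation gives a 0.5 threshold.
gated = threshold_circuit([0.0, 0.5, 1.0, 2.0], 1.0, 0.5)
```

Note that in this model the output is the amount by which the input exceeds the threshold, so it rises smoothly from zero rather than jumping to the full input value.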

The output signal from rectifier 8 multiplies the output noise signal from noise generator 9 in modulator 10. Modulator 10, as a result, produces an output signal only when half-wave rectifier 8 produces an output signal. Thus, the output signal from modulator 10 represents quasi-periodic bursts of noise energy. Fricative network 11 shapes this modulated noise energy by superimposing on it the formant structure of the remainder of the vocal tract downstream from the constriction. A suitable unvoiced spectrum synthesizer for generating unvoiced component signals is described in the aforementioned David et al. patent. Summing network 12 combines the modulated noise from fricative network 11 with the shaped quasi-periodic energy from formant network 2 to produce an electrical signal representing the synthesized voiced fricative. Loudspeaker 13 converts this electrical signal into an acoustic sound.
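The modulation and summing steps can be sketched as a sample-by-sample multiply followed by a sample-by-sample add. This is only an illustration: the Gaussian noise source, the fixed seed, and the names `modulate_noise` and `sum_components` are assumptions, and the spectral shaping performed by fricative network 11 is omitted.

```python
import random


def modulate_noise(envelope, seed=0):
    """Multiply noise by the rectifier output; zero envelope gives zero output."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [e * rng.gauss(0.0, 1.0) for e in envelope]


def sum_components(voiced, unvoiced):
    """Add the voiced and unvoiced signals sample by sample."""
    return [v + u for v, u in zip(voiced, unvoiced)]


# Illustrative values only: noise appears only where the envelope is nonzero.
envelope = [0.0, 0.0, 0.5, 1.0, 0.5, 0.0]
bursts = modulate_noise(envelope)
composite = sum_components([0.1] * 6, bursts)
```

Because the envelope is zero between pitch pulses, the noise arrives in bursts at the pitch rate, which is the behavior the modulator is meant to produce.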

Of course, the vocal tract is, in general, continuously changing in shape. Thus, both the center frequency and the bandwidth of resonator 4, which produces the shape of the first formant of the vocal tract, are controlled as a function of time by control signals from sources not shown. Such control signals might, for example, be derived in a well-known manner from a table of stored formant data similar to the table shown on page 208 of the above-cited book by Flanagan. In addition, the center frequencies and bandwidths of the formants produced by formant network 2 and fricative network 11 are also controlled in the same manner. Network 2, for example, which produces the shape of a selected number of the formants in the vocal tract, is controlled so that the center frequencies and bandwidths of the first three formants can be varied in response to control signals. Network 11 is similarly controlled.

Voiced fricatives synthesized by the network shown in FIG. 1 contain a quasi-periodic, or voiced, component and, if the quasi-periodic component exceeds a selected magnitude, an aperiodic, or unvoiced, component. The resulting synthesized voiced fricatives are superior in quality to the voiced fricatives synthesized by prior art systems. Yet the voiced fricative synthesizer itself is quite simple.

Other embodiments incorporating the principles of this invention will be apparent from this disclosure to those skilled in the art of speech synthesis.

I claim:

1. Apparatus for synthesizing voiced fricative sounds which comprises:

means for producing first and second signals representing the voiced and the unvoiced components, respectively, of a selected voiced fricative; and

means, supplied with said first signal and, whenever said first signal exceeds a selected amplitude threshold, with said second signal, for combining said signals to produce a composite signal representing said selected voiced fricative.

2. Apparatus for the synthesis of voiced fricatives which comprises:

means for producing pulse signals at selected intervals;

means for processing said pulses to produce a first signal representative of the voiced component of a selected voiced fricative;

means for generating a second signal containing noise energy during the time said pulse signals exceed a selected amplitude and containing no energy at all other times, said second signal representing the unvoiced component of said voiced fricative; and

means for combining said first and second signals to produce said voiced fricative.

3. Apparatus as in claim 2 in which said means for processing comprises a network for superimposing on said pulse signals the spectral shape of the vocal tract formants.

4. Apparatus as in claim 2 in which said means for generating comprises:

a shaping network for converting said pulse signals into a shaped signal resembling, as a function of time, the air flow through vibrating vocal cords;

a resonator for superimposing on said shaped signal the formant structure of the back cavity of the vocal tract;

a threshold circuit for passing only those portions of said shaped signal from said resonator which exceed a selected amplitude;

a noise source;

means for modulating the noise from said source with the output signals from said threshold circuit to produce bursts of noise during the time said shaped signal from said resonator exceeds said selected amplitude; and

a fricative network for superimposing on said bursts of noise the formant structure of a selected portion of the vocal tract.

5. Apparatus as in claim 4 in which said threshold circuit comprises:

a battery, the positive terminal of which is connected to ground;

a variable attenuator connected between the negative terminal of said battery and ground, said attenuator possessing a movable output tap;

a summing network, one input terminal of which is connected to said movable output tap from said attenuator and the other input terminal of which is connected to receive the output signals from said resonator, said summing network producing a positive output signal only when the output signal from said resonator exceeds the absolute value of the bias voltage on the output tap of said attenuator; and

a half-wave rectifier for passing the positive output signals from said summing network, said positive output signals comprising the output signals from said threshold circuit.

